Identifying Gene Function Descriptions by Probability-based Sentence Selection
Abstract
This paper proposes an approach to the secondary task of the TREC Genomics Track. We treat the task as identifying the sentences that describe gene functions (i.e., GeneRIFs) and propose a method that considers two factors: topicality and relevance. The former is measured from a sentence's location and from word frequencies in the article. The latter is the sentence's relevance as a GeneRIF, based on the vocabulary used in the article. We formalize a probabilistic model that combines these features. We evaluate the method on a test set of 139 MEDLINE abstracts. The results indicate that (a) function words in the input can help to identify gene function descriptions, (b) there is a vocabulary peculiar to GeneRIFs, and (c) location information has the highest predictive power for this task despite its simplicity. We also compare our method with several alternative methods.
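The abstract does not give the model's equations, so the following is only a minimal illustrative sketch of how a topicality factor and a relevance factor might be combined in log space to rank sentences. Everything in it is an assumption rather than the authors' formulation: the function names, the add-one-smoothed unigram estimates, the per-sentence `position_prior` values, and the simple sum of log-scores. Function words are deliberately kept during tokenization, in line with finding (a).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (function words are kept)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def avg_log_prob(tokens, counts, vocab_size):
    """Mean add-one-smoothed unigram log-probability of tokens under counts."""
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    return sum(
        math.log((counts.get(t, 0) + 1) / (total + vocab_size)) for t in tokens
    ) / len(tokens)


def score_sentences(sentences, generif_counts, position_prior):
    """Return one log-score per sentence (hypothetical combination).

    topicality = log position prior + word-frequency term from the article
    relevance  = word likelihood under a GeneRIF vocabulary model
    """
    tokenized = [tokenize(s) for s in sentences]
    article_counts = Counter(t for toks in tokenized for t in toks)
    vocab_size = len(set(article_counts) | set(generif_counts))
    scores = []
    for toks, prior in zip(tokenized, position_prior):
        topicality = math.log(prior) + avg_log_prob(toks, article_counts, vocab_size)
        relevance = avg_log_prob(toks, generif_counts, vocab_size)
        scores.append(topicality + relevance)
    return scores
```

In this sketch, the sentence with the highest score would be returned as the candidate GeneRIF. The `generif_counts` mapping would have to be estimated from existing GeneRIF text, and `position_prior` from how often GeneRIFs come from each position (for example, the title or the last sentence) in training data.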